DPRESS: Localizing estimates of predictive uncertainty
Abstract
BACKGROUND There is a steadily growing need for quantitative estimates of the predictive uncertainty of QSAR models, in part because such predictions are widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each new object u: the standard error of prediction s_u can be estimated as the non-cross-validated error s_t* for the closest object t* in the training set, adjusted for its separation d from u in the descriptor space relative to the size of the training set. The predictive uncertainty factor gamma_t* is obtained by distributing the internal predictive error sum of squares across objects in the training set based on the distances between them, hence the acronym: Distributed PRedictive Error Sum of Squares (DPRESS). Note that s_t* and gamma_t* are characteristic of each training set compound contributing to the model of interest.
RESULTS The method was applied to partial least-squares models built using 2D (molecular hologram) or 3D (molecular field) descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors. The observed variation in predictive error for the external 229-compound test sets was compared with the uncertainty estimates from DPRESS. Good qualitative and quantitative agreement was seen between the observed distributions of predictive error and those predicted using DPRESS. Including the distance-dependent term was essential to obtaining good agreement between the estimated uncertainties and the observed distributions of predictive error. The uncertainty estimates derived by DPRESS were conservative even when the training set was biased, but not excessively so.
CONCLUSION DPRESS is a straightforward and powerful way to reliably estimate individual predictive uncertainties for compounds outside the training set, based on their distance to the training set and the internal predictive uncertainty associated with their nearest neighbor in that set. It represents a sample-based, a posteriori approach to defining applicability domains in terms of localized uncertainty.
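The nearest-neighbor step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula that combines s_t*, gamma_t*, d, and N, so the inflation term used here, sqrt(1 + gamma_t* · d² / N), is a placeholder assumed purely for illustration and should not be read as the published DPRESS expression. The function names, the Euclidean metric, and the example values are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_training_object(
    u: Sequence[float], training: List[Sequence[float]]
) -> Tuple[int, float]:
    """Return (index, distance) of the training object t* closest to u."""
    distances = [euclidean(u, t) for t in training]
    idx = min(range(len(training)), key=distances.__getitem__)
    return idx, distances[idx]


def localized_uncertainty(
    u: Sequence[float],
    training: List[Sequence[float]],
    s: Sequence[float],      # per-object non-cross-validated errors s_t
    gamma: Sequence[float],  # per-object predictive uncertainty factors gamma_t
) -> float:
    """Estimate s_u from the nearest training object t*.

    ASSUMPTION: the inflation term below is a hypothetical stand-in for
    the distance adjustment; the abstract does not state the real one.
    """
    idx, d = nearest_training_object(u, training)
    n = len(training)
    return s[idx] * math.sqrt(1.0 + gamma[idx] * d * d / n)


if __name__ == "__main__":
    # Made-up toy values, not data from the study.
    training = [(0.0, 0.0), (3.0, 4.0)]
    s = [0.5, 1.0]
    gamma = [2.0, 2.0]
    print(localized_uncertainty((0.0, 0.0), training, s, gamma))
    print(localized_uncertainty((3.0, 0.0), training, s, gamma))
```

Under this placeholder form, a query object that coincides with a training object inherits that object's error unchanged, and the estimate grows as the object moves away from the training set, which matches the qualitative behavior the abstract describes.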
Related articles
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Deep neural networks (NNs) are powerful black box predictors that have recently achieved impressive performance on a wide spectrum of tasks. Quantifying predictive uncertainty in NNs is a challenging and yet unsolved problem. Bayesian NNs, which learn a distribution over weights, are currently the state-of-the-art for estimating predictive uncertainty; however these require significant modifica...
Uncertainty in QSAR predictions.
It is relevant to consider uncertainty in individual predictions when quantitative structure-activity (or property) relationships (QSARs) are used to support decisions of high societal concern. Successful communication of uncertainty in the integration of QSARs in chemical safety assessment under the EU Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) system can be f...
Incremental explosive analysis and its application to performance-based assessment of stiffened and unstiffened plates
In this paper, the dynamic behavior of square plates with various thicknesses and stiffening configurations subjected to underwater explosion (UNDEX) is evaluated through a relatively novel approach called Incremental Explosive Analysis (IEA). The IEA estimates the different limit-states and deterministic assessment of plates’ behavior, considering uncertainty of loading conditions and...
The Strategic Importance of Predictive Uncertainty in Conjoint Design
Even if firms have precise partworth estimates, predictive uncertainty (which arises due to randomness in customer behavior and attributes not included in the conjoint study) has strategic implications. Firms make strategic errors that reduce profits if they overestimate or underestimate predictive uncertainty. If firms use low-quality market research and thus overestimate uncertainty in predic...
Comparison of Estimates Using Record Statistics from Lomax Model: Bayesian and Non Bayesian Approaches
This paper addresses the problem of Bayesian estimation of the parameters, reliability and hazard function in the context of record statistics values from the two-parameter Lomax distribution. The ML and the Bayes estimates based on records are derived for the two unknown parameters and the survival time parameters, reliability and hazard functions. The Bayes estimates are obtained based on conju...